Monsey
Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings
Hanley, Hans W. A., Durumeric, Zakir
Contextual large language model embeddings are increasingly utilized for topic modeling and clustering. However, current methods often scale poorly, rely on opaque similarity metrics, and struggle in multilingual settings. In this work, we present a novel, scalable, interpretable, hierarchical, and multilingual approach to clustering news articles and social media data. To do this, we first train multilingual Matryoshka embeddings that can determine story similarity at varying levels of granularity based on which subset of the dimensions of the embeddings is examined. This embedding model achieves state-of-the-art performance on the SemEval 2022 Task 8 test dataset (Pearson $\rho = 0.816$). With the model trained, we then develop an efficient hierarchical clustering algorithm that leverages the hierarchical nature of Matryoshka embeddings to identify unique news stories, narratives, and themes. We conclude by illustrating how our approach can identify and cluster stories, narratives, and overarching themes within real-world news datasets.
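The idea of reading similarity off different subsets of the embedding dimensions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vectors, and the prefix lengths are hypothetical, and it assumes the subsets are leading prefixes of the vector compared by cosine similarity, which is the usual Matryoshka convention but is not stated in the abstract. Which prefix length corresponds to which level (story, narrative, or theme) is likewise left open here.

```python
import math
from typing import Dict, Sequence


def prefix_cosine(a: Sequence[float], b: Sequence[float], dims: int) -> float:
    """Cosine similarity computed on the first `dims` dimensions only."""
    a_sub, b_sub = a[:dims], b[:dims]
    dot = sum(x * y for x, y in zip(a_sub, b_sub))
    norm = math.sqrt(sum(x * x for x in a_sub)) * math.sqrt(sum(y * y for y in b_sub))
    # Guard against zero vectors so the function never divides by zero.
    return dot / norm if norm > 0 else 0.0


def level_similarities(
    a: Sequence[float], b: Sequence[float], levels: Sequence[int] = (2, 4)
) -> Dict[int, float]:
    """Similarity of two embeddings at each prefix length in `levels` (hypothetical sizes)."""
    return {d: prefix_cosine(a, b, d) for d in levels}


# Toy 4-dimensional "embeddings": identical in the first two dimensions,
# different in the last two.
article_a = [1.0, 0.0, 1.0, 0.0]
article_b = [1.0, 0.0, 0.0, 1.0]
sims = level_similarities(article_a, article_b)
```

In this toy case the two vectors agree fully on the short prefix and only partially on the full vector, which is the kind of level-dependent signal a hierarchical clustering step could threshold separately at each level.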
- Asia > North Korea (0.28)
- Europe > Ukraine (0.14)
- Asia > Russia (0.14)
- Research Report (1.00)
- Overview (0.68)
- Media > News (1.00)
- Information Technology (1.00)
- Government > Foreign Policy (0.93)
AI-based Identity Fraud Detection: A Systematic Review
Zhang, Chuo Jun, Gill, Asif Q., Liu, Bo, Anwar, Memoona J.
With the rapid development of digital services, a large volume of personally identifiable information (PII) is stored online and is subject to cyberattacks such as identity fraud. Most recently, the use of Artificial Intelligence (AI)-enabled deepfake technologies has significantly increased the complexity of identity fraud. Fraudsters may use these technologies to create highly sophisticated counterfeit personal identification documents, photos, and videos. These advancements in the identity fraud landscape pose challenges for identity fraud detection and for society at large. There is a pressing need to review and understand identity fraud detection methods, their limitations, and potential solutions. This research aims to address this need by using the well-known systematic literature review method. This paper reviews a selected set of 43 papers across 4 major academic literature databases. In particular, the review results highlight two types of identity fraud prevention and detection methods, discussed in depth, and open challenges. The results were also consolidated into a taxonomy of AI-based identity fraud detection and prevention methods, including key insights and trends. Overall, this paper provides a foundational knowledge base to researchers and practitioners for further research and development in this important area of digital identity fraud.
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > New York > Rockland County > Monsey (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)